Inferences about the global scenario of human T-cell lymphotropic
Human T-cell lymphotropic virus type 1 (HTLV-1) is mainly associated with two diseases: tropical spastic
paraparesis/HTLV-1-associated myelopathy (TSP/HAM) and adult T-cell leukaemia/lymphoma. This retrovirus infects
five to 10 million individuals throughout the world. Previously, we developed a database that annotates sequence
data from GenBank; the present study aimed to describe the clinical, molecular and epidemiological scenarios of
HTLV-1 infection through the sequences stored in this database. A total of 2,545 registered complete and partial
sequences of HTLV-1 were collected and 1,967 (77.3%) of those sequences represented unique isolates. Among these
isolates, 93% contained geographic origin information and only 39% were related to any clinical status. A total of
1,091 sequences contained information about the geographic origin and viral subtype and 93% of these sequences
were identified as subtype "a". Ethnicity data are very scarce. Regarding clinical status data, 29% of the sequences
were generated from TSP/HAM and 67.8% from healthy carrier individuals. Although the data mining enabled some
inferences about specific aspects of HTLV-1 infection to be made, the relative scarcity of data annotated for the
available sequences made it impossible to delineate a global scenario of HTLV-1 infection.

Key words: HTLV-1 - data mining - HTLV-1 database 



Human T-cell lymphotropic virus type 1 (HTLV-1) was the first described human retrovirus (Poiesz et al. 1980).
This retrovirus is the causative agent of tropical spastic paraparesis/HTLV-1-associated myelopathy (TSP/HAM)
(Gessain et al. 1985), adult T-cell leukaemia/lymphoma (ATL) (Yoshida et al. 1982) and other inflammatory
diseases such as HTLV-1-associated infectious dermatitis (La Grenade et al. 1998) and HTLV-1-associated uveitis
(Mochizuki et al. 1992). However, the pathogenesis of some clinical manifestations is not yet fully understood.

Epidemiological data show that HTLV-1 has a worldwide distribution and it is estimated that five to 10 million
people are infected (Gessain & Cassar 2012). This infection is endemic in southwestern Japan (Mueller et al.
1996), sub-Saharan Africa (Gessain & de The 1996), regions of the Caribbean (Hanchard et al. 1990) and minor
areas in Iran, Melanesia (Mueller 1991) and Brazil (Galvao-Castro et al. 1997).

Nevertheless, HTLV-1 epidemiology still presents many challenges. Virus prevalence rates have been correlated
with geographic characteristics and the social setting of destitute populations. However, these populations are
not frequently the target of great public and government interest (Galvao-Castro et al. 1997). Molecular studies,
especially during the last decade, have contributed to the acquisition of knowledge about virus epidemiology
and molecular characteristics. Furthermore, these molecular studies generate a large number of viral sequences;
for this reason, appropriate data management and data mining can provide additional consistent information
about HTLV-1 infection.

In response to the need to obtain more information about the HTLV-1 sequences already generated and available,
the HTLV-1 Molecular Epidemiology Database (htlv1db.fiocruz.bahia.br) was developed (Araujo et al. 2012). This
database contains information that can support our understanding of viral pathogenesis, the route of
transmission, polymorphisms, epidemiology, genotype-phenotype relationships, geographic distribution and viral
evolution. Therefore, the purpose of the present study was to assess the different types of information
deposited in the HTLV-1 Molecular Epidemiology Database in order to describe clinical, molecular and
epidemiological scenarios of HTLV-1 infection.

MATERIALS AND METHODS 

This is a descriptive study of the clinical, molecular and epidemiological data on HTLV-1 infection that are
associated with the genetic sequences stored in the HTLV-1 Molecular Epidemiology Database.

All the descriptive analyses were performed using the search algorithm implemented in the HTLV-1 database
(Araujo et al. 2012). Initially, we made a list of the variables (age, gender, clinical status, subtype,
subgroup, geographic origin) that were most frequently annotated in the database. We then performed searches
on combinations of the listed variables; for example, we searched for sequences with information about
geographic origin, viral subtype and clinical status. These combinations constitute the subheadings of the
Results and Discussion section. The HTLV-1 database allows all the search results to be organised as
spreadsheets for further analyses.

After the search step and the generation of spreadsheets containing the results, we used the Excel program
to perform descriptive analyses.
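The combination search described in this section can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the search algorithm of the HTLV-1 database: the record layout, the field names (`geographic_origin`, `subtype`, `clinical_status`), the accession labels and the sample values are all hypothetical and were chosen only to mirror the variables listed above.

```python
from typing import Dict, List, Optional

# Hypothetical annotation records: field names and values are illustrative
# assumptions, not an export from the HTLV-1 database.
Record = Dict[str, Optional[str]]

records: List[Record] = [
    {"accession": "SEQ001", "geographic_origin": "Brazil", "subtype": "a", "clinical_status": "HC"},
    {"accession": "SEQ002", "geographic_origin": "Japan", "subtype": "a", "clinical_status": None},
    {"accession": "SEQ003", "geographic_origin": None, "subtype": "b", "clinical_status": "TSP/HAM"},
    {"accession": "SEQ004", "geographic_origin": "Gabon", "subtype": "d", "clinical_status": "HC"},
]


def with_all_fields(rows: List[Record], fields: List[str]) -> List[Record]:
    """Keep only the rows in which every requested field is annotated."""
    return [row for row in rows if all(row.get(field) for field in fields)]


def coverage(rows: List[Record], fields: List[str]) -> float:
    """Percentage of rows annotated for every requested field."""
    if not rows:
        return 0.0
    return 100.0 * len(with_all_fields(rows, fields)) / len(rows)


# One of the variable combinations mentioned in the text.
combo = ["geographic_origin", "subtype", "clinical_status"]
complete = with_all_fields(records, combo)
print([row["accession"] for row in complete])
print(f"{coverage(records, combo):.1f}% of records carry all three annotations")
```

Treating a missing annotation as `None` keeps the filter independent of which variables are combined, so the same two helper functions serve every subheading of the analysis.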

RESULTS AND DISCUSSION 

Currently, the HTLV-1 Molecular Epidemiology Database stores 2,545 HTLV-1 sequences, 1,967 (77.3%) of which
represent different isolates. These 1,967 sequences were selected for this study and 91 (3.6%) other distinct
sequences, which did not have information about the viral isolate, were also included. Ultimately, 2,058
sequences and their data were analysed.
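The composition of the analysed data set can be checked with simple arithmetic. The short sketch below uses only the counts reported in this paragraph; the variable names and the helper `pct` are illustrative assumptions.

```python
# Counts reported in the text.
total_stored = 2545      # all sequences stored in the database
unique_isolates = 1967   # sequences representing different isolates
no_isolate_info = 91     # distinct sequences lacking isolate information

# Sequences retained for the descriptive analyses.
analysed = unique_isolates + no_isolate_info


def pct(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`, rounded to one decimal."""
    return round(100 * part / whole, 1)


print(analysed)                             # 2058
print(pct(unique_isolates, total_stored))   # 77.3
print(pct(no_isolate_info, total_stored))   # 3.6
```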

Geographic origin, viral subtype and clinical status among viral sequences - Among the 2,058 viral sequences,
1,914 (93%) were associated with a geographic origin in the GenBank notes. Fig. 1 shows the distribution of
HTLV-1 sequences among different geographic regions: 1.6% of the sequences originated from HTLV-1 isolates
from North America, 2.4% from Oceania, 3% from Europe, 3.3% from Central America, 17.7% from Africa, 32% from
Asia and 40% from South America. Among the South American sequences, most of the isolates were from HTLV-1
infections in Brazil (55%) and Argentina (22.1%), as shown in Fig. 2.

Although HTLV-1 infection has a cosmopolitan geographic distribution, this distribution is heterogeneous,
such that Asia and South America are characterised as endemic areas (Proietti et al. 2005, Carneiro-Proietti
et al. 2006). This heterogeneity is also reflected in the distribution of sequences available in GenBank and in
the number of exploratory studies of HTLV-1 infection developed in each geographical region.

Several studies have reported a high prevalence of HTLV infection in Africa (Proietti et al. 2005); however,
few sequences from this geographic region are deposited in GenBank. This pattern is common and emphasises
the need to increase the use of molecular data as an important tool for epidemiological investigation. This
result suggests that it is necessary to create new health units able to perform molecular diagnosis and,
therefore, to generate and record molecular data about new HTLV-1 cases. Nonetheless, the high prevalence of
HTLV infection in Asia and parts of South America (Carneiro-Proietti et al. 2006, Sonoda et al. 2011) is
consistent with the large number of sequences from these regions assembled in the database. However, clinical
and epidemiological information is often lacking in the GenBank annotations. This observation shows that
authors should provide as much information as possible because the molecular data could be useful for many
different inferences about HTLV-1 infection and, therefore, for encouraging prevention policies.

Fig. 1: geographical distribution of 2,058 sequences stored in the HTLV-1 Molecular Epidemiology Database.

The search for HTLV-1 sequences annotated with both geographic origin and viral subtype showed that 1,091
contained information for both variables. The results showed that 1,019 (93.4%) sequences were classified as
subtype "a" and that this subtype had a worldwide distribution among the sequences deposited in the database.
This higher prevalence can be attributed to the fact that it is the subtype distributed worldwide, found
especially in Japan, the Caribbean, South America and Africa. Subtypes "b" (4.9%), "c" (0.5%), "d" (0.6%),
"e" (0.1%), "f" (0.2%) and "g" (0.2%) were distributed in specific regions (Fig. 3). These subtypes are usually
restricted to certain areas, such as subtype "c", found in Australia-Melanesia (Galvao-Castro et al. 1997).

Finally, using the geographic origin and viral subtype, we performed a search for clinical status. We
identified 279 sequences that had information for these three variables in the GenBank annotations. Of these
sequences, 35.8% originated from Asia, 32.2% from South America, 14.3% from Africa, 10% from Central America,
2.8% from Europe, 4.3% from North America and 0.3% from Oceania. Our analyses showed that 86.3% of the
sequences were identified as subtype "a", 12.5% as subtype "b" and 1.1% as subtype "d". Concerning the clinical
status, 158 sequences (subtypes "a", 88.6%; "b", 10.7%; "d", 0.6%) were derived from HTLV-1 infection in healthy
carrier (HC) individuals, 71 sequences (subtypes "a", 80.2%; "b", 18.3%; "c", 1.5%) were from TSP/HAM
individuals and 34 sequences (subtypes "a", 91.2%; "b", 8.8%) were from ATL individuals; 19 viral sequences
(subtypes "a", 84.2%; "b", 10.5%; "d", 5.3%) were related to other diseases not yet described.

Fig. 2: distribution (%) of human T-cell lymphotropic virus type 1 sequences among the countries in South
America.

Clinical status and viral subtype and subgroup of viral sequences from South America - The HTLV-1 Molecular
Epidemiology Database enabled the observation of the overall scenario of HTLV-1 sequences from South America.
Of all the sequences collected from South America, 77 were from HC individuals, 63 from TSP/HAM individuals
and 12 from ATL individuals; six viral sequences were from individuals with other diseases. However, 609
sequences did not have any information about the clinical status in the GenBank annotations. As most of the
studies of HTLV-1 infection in South America have been developed in Argentina and Brazil, the greatest number
of sequences and, therefore, the greatest amount of molecular and epidemiological data are also generated from
these countries. For this reason, it is important to emphasise the need to develop new exploratory studies of
HTLV-1 infection in other countries in South America.

Only 77 of the sequences from cases of HTLV-1 infection in South America had information about the clinical
status, viral subtype and subgroup simultaneously in the GenBank annotation. Among these sequences, 18 were
from TSP/HAM individuals, four were from ATL individuals and six were from individuals with other
HTLV-associated diseases. The greatest number of sequences was generated from HC individuals (n = 49). All of
the 77 sequences were identified, in the GenBank annotation, as subtype "a", which was the subtype most
frequently found among infected individuals in South America. With regard to the subgroup classification, 70
sequences were identified as subgroup "a" (n = 48 HC, n = 2 ATL, n = 15 TSP/HAM, n = 5 other diseases), three
sequences as subgroup "b" (n = 2 TSP/HAM, n = 1 ATL) and three sequences as subgroup "c" (n = 1 ATL, n = 1 HC,
n = 1 TSP/HAM); only one sequence was identified as subgroup "e" (lymphoma).
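The subgroup counts above can be arranged as a small cross-tabulation to confirm that they are consistent with the totals per clinical status. The sketch below uses only the counts given in this paragraph; placing the single subgroup "e" (lymphoma) sequence under "other" is an assumption, made because it is the only placement that reproduces the six sequences from other HTLV-associated diseases.

```python
# Counts by subgroup and clinical status for the 77 South American sequences
# (all subtype "a"), as reported in the text. The subgroup "e" (lymphoma)
# sequence is counted under "other", which is an assumption.
crosstab = {
    "a": {"HC": 48, "ATL": 2, "TSP/HAM": 15, "other": 5},
    "b": {"HC": 0, "ATL": 1, "TSP/HAM": 2, "other": 0},
    "c": {"HC": 1, "ATL": 1, "TSP/HAM": 1, "other": 0},
    "e": {"HC": 0, "ATL": 0, "TSP/HAM": 0, "other": 1},
}

# Row totals: number of sequences per subgroup.
subgroup_totals = {subgroup: sum(row.values()) for subgroup, row in crosstab.items()}

# Column totals: number of sequences per clinical status.
status_totals = {}
for row in crosstab.values():
    for status, n in row.items():
        status_totals[status] = status_totals.get(status, 0) + n

print(subgroup_totals)                 # {'a': 70, 'b': 3, 'c': 3, 'e': 1}
print(status_totals)                   # {'HC': 49, 'ATL': 4, 'TSP/HAM': 18, 'other': 6}
print(sum(subgroup_totals.values()))   # 77
```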

Clinical status, age, gender, viral subtype and ethnicity among viral sequences - The search for clinical
status alone showed that 797 of the stored sequences were related to one HTLV-1-associated clinical status
(TSP/HAM, 43%; ATL, 19%; HC, 32.39%; other diseases, 5.61%). The other HTLV-1-associated diseases, such as
dermatitis and sicca syndrome, were not reported in any annotation of the stored viral sequences. Using the
information about the clinical status, we searched for additional data such as gender, age, viral subtype and
ethnicity (Table). Approximately 15.2% of the sequences contained information about the infected patient's
gender and 10.8% of the sequences provided the age of the infected patient.

The data about ethnicity were very scarce, as only 41 (5%) of the 797 stored sequences had information about
ethnic origin in the GenBank annotation. Approximately 78% of these 41 sequences had information about
gender, clinical status, viral subtype, ethnicity and age simultaneously. All originated from women infected
by an HTLV-1 subtype "a" isolate and 27.5% of these women were younger than 40 years old. Regarding clinical
status, 29% of the sequences originated from women with TSP/HAM and 67.8% from HC women; 3.2% of the sequences
were generated from infected women with other HTLV-1-associated diseases.
TABLE

Distribution of gender, subtype and geographic origin among the clinical status: tropical spastic
paraparesis/human T-cell lymphotropic virus type 1 (HTLV-1)-associated myelopathy (TSP/HAM), adult T-cell
leukaemia/lymphoma (ATL), healthy carrier (HC) and other HTLV-1-associated disease

                                                     TSP/HAM      ATL          HC           Others (a)
                                                     n = 343      n = 137      n = 261      n = 45
Clinical status                                      n (%)        n (%)        n (%)        n (%)

Sequences with information about sex                 40 (11.6)    13 (9.4)     89 (34)      16 (35.5)
  Male                                               15 (37.5)    6 (46.2)     33 (37)      8 (50)
  Female                                             25 (62.5)    7 (53.8)     56 (63)      8 (50)
Sequences with information about subtype             71 (20.7)    36 (26.3)    158 (60.5)   19 (42.2)
  Subtype "a"                                        57 (80.2)    33 (91.6)    140 (88.6)   16 (84.2)
  Subtype "b"                                        13 (18.4)    3 (8.4)      17 (10.8)    2 (10.5)
  Other subtype                                      1 (1.4)      0 (0)        1 (0.6)      1 (5.3)
Sequences with information about geographic origin   320 (93.4)   128 (93.4)   260 (99.6)   45 (100)

a: sequences from patients with either infective dermatitis, histoplasmosis, stroke, seborrhoea dermatitis,
leprosy, nonspastic paraparesis, facial palsy or leg paresis.




A great number of studies show that some ethnically defined factors are likely to be associated with HTLV-1
persistence and the development of ATL or TSP/HAM among HTLV-1 endemic populations. Therefore, this
information should be further investigated in cases of HTLV-1 infection. Furthermore, new studies of the
genetic background of infected individuals, through the analysis of polymorphic determinants of human leukocyte
antigen alleles and their immune responsiveness to HTLV-1, are important for addressing the ethnic factors
involved in HTLV-1 clustering and the disease segregation of ATL and TSP/HAM (Sonoda et al. 2011).

The HTLV-1 Molecular Epidemiology Database enabled some inferences about specific aspects of HTLV-1
infection. However, due to the relative scarcity of data annotated for the available sequences, it was not
possible to delineate a global scenario of HTLV-1 infection. Molecular and epidemiological data for viral
sequences should be provided more frequently because this information can be used for planning public health
policies.